Sprinkled Latent Semantic Indexing for Text Classification with Background Knowledge
Authors
Abstract
Text classification faces two key problems: the polysemy and synonymy inherent in natural language, and the insufficient use of abundant, useful, but unlabeled text documents. To address both, we combine sprinkled Latent Semantic Indexing (LSI) with background knowledge for text classification. The motivation is twofold: 1) LSI is a popular information retrieval technique that has also succeeded in text classification by addressing polysemy and synonymy; 2) by fusing the sprinkled terms with unlabeled terms, our method accounts for class relationships while also exploiting unlabeled information. Experimental results on text documents show that the proposed method improves classification performance.
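The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`sprinkled_lsi_basis`, `predict`), the number of sprinkled terms per class, the rank, and the nearest-centroid cosine classifier are all assumptions chosen for illustration. Labeled documents receive artificial class-indicator ("sprinkled") columns, unlabeled background documents receive zeros in those columns, and a truncated SVD of the combined matrix yields a latent space onto which the real terms are projected.

```python
import numpy as np

def sprinkled_lsi_basis(X_lab, y, X_unlab, n_classes, n_sprinkle=2, rank=2):
    """Return an (n_terms, rank) projection learned from labeled and unlabeled docs.

    X_lab:   (n_lab, n_terms) term counts of labeled documents
    y:       (n_lab,) integer class labels in [0, n_classes)
    X_unlab: (n_unlab, n_terms) term counts of unlabeled background documents
    """
    n_lab, n_terms = X_lab.shape
    # Sprinkled columns: n_sprinkle artificial terms per class, set to 1
    # for every labeled document of that class (hypothetical weighting).
    S = np.zeros((n_lab, n_classes * n_sprinkle))
    for i, c in enumerate(y):
        S[i, c * n_sprinkle:(c + 1) * n_sprinkle] = 1.0
    # Unlabeled documents carry no class information: zeros in sprinkled columns.
    S_unlab = np.zeros((X_unlab.shape[0], S.shape[1]))
    A = np.vstack([np.hstack([X_lab, S]), np.hstack([X_unlab, S_unlab])])
    # Truncated SVD of the combined document-by-term matrix.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep the top `rank` directions and drop the sprinkled columns.
    return Vt[:rank, :n_terms].T

def predict(X_test, X_lab, y, V_k, n_classes):
    """Assign each test document to the class with the most similar centroid
    (cosine similarity in the latent space; an assumed, simple classifier)."""
    Z_lab, Z_test = X_lab @ V_k, X_test @ V_k
    centroids = np.vstack([Z_lab[y == c].mean(axis=0) for c in range(n_classes)])
    eps = 1e-12  # guards against division by zero for empty projections
    Zn = Z_test / (np.linalg.norm(Z_test, axis=1, keepdims=True) + eps)
    Cn = centroids / (np.linalg.norm(centroids, axis=1, keepdims=True) + eps)
    return np.argmax(Zn @ Cn.T, axis=1)

# Toy data: terms 0-1 belong to class 0, terms 2-3 to class 1.
X_lab = np.array([[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]], float)
y = np.array([0, 0, 1, 1])
X_unlab = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], float)
X_test = np.array([[3, 0, 0, 0], [0, 0, 0, 3]], float)

V_k = sprinkled_lsi_basis(X_lab, y, X_unlab, n_classes=2)
preds = predict(X_test, X_lab, y, V_k, n_classes=2)
print(preds)
```

Dropping the sprinkled columns after the SVD means test documents, which have no labels, can be projected with the same basis. Published sprinkling methods may instead classify on a low-rank reconstruction of the matrix, so treat the projection step here as one possible simplification.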
Similar resources
Evaluation of Background Knowledge for Latent Semantic Indexing Classification
This paper presents work that evaluates background knowledge for use in improving accuracy for text classification using Latent Semantic Indexing (LSI). LSI’s singular value decomposition process can be performed on a combination of training data and background knowledge. Intuitively, the closer the background knowledge is to the classification task, the more helpful it will be in terms of crea...
Improving Text Classification with LSI Using Background Knowledge
We present work in progress that uses Latent Semantic Indexing (LSI) in conjunction with background knowledge and unlabeled examples to improve text classification accuracy. The singular value decomposition (SVD) that is performed by LSI is done on an expanded term by document matrix that includes the labeled training examples as well as the unlabeled examples. We report classification accuracy...
Integrating Background Knowledge into Nearest-Neighbor Text Classification
This paper describes two different approaches for incorporating background knowledge into nearest-neighbor text classification. Our first approach uses background text to assess the similarity between training and test documents rather than assessing their similarity directly. The second method redescribes examples using Latent Semantic Indexing on the background knowledge, assessing document similari...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcomings in capturing the semantic concepts of text motivated researchers to use...
Support Vector Machines for Text Categorization Based on Latent Semantic Indexing
Text Categorization (TC) is an important component in many information organization and information management tasks. Two key issues in TC are feature coding and classifier design. This paper describes a Text Categorization approach that uses Support Vector Machines (SVMs) based on Latent Semantic Indexing (LSI). Latent Semantic Indexing [1][2] is a method for selecting informative subspaces of featu...